A Study of Translation Rule Classification for Syntax-based Statistical Machine Translation

نویسندگان

  • Hongfei Jiang
  • Sheng Li
  • Muyun Yang
  • Tiejun Zhao
چکیده

Recently, numerous statistical machine translation models which can utilize various kinds of translation rules are proposed. In these models, not only the conventional syntactic rules but also the non-syntactic rules can be applied. Even the pure phrase rules are includes in some of these models. Although the better performances are reported over the conventional phrase model and syntax model, the mixture of diversified rules still leaves much room for study. In this paper, we present a refined rule classification system. Based on this classification system, the rules are classified according to different standards, such as lexicalization level and generalization. Especially, we refresh the concepts of the structure reordering rules and the discontiguous phrase rules. This novel classification system may supports the SMT research community with some helpful references.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

متن کامل

Improving Statistical Machine Translation using Lexicalized Rule Selection

This paper proposes a novel lexicalized approach for rule selection for syntax-based statistical machine translation (SMT). We build maximum entropy (MaxEnt) models which combine rich context information for selecting translation rules during decoding. We successfully integrate the MaxEnt-based rule selection models into the state-of-the-art syntax-based SMT model. Experiments show that our lex...

متن کامل

Fast Translation Rule Matching for Syntax-based Statistical Machine Translation

In a linguistically-motivated syntax-based translation system, the entire translation process is normally carried out in two steps, translation rule matching and target sentence decoding using the matched rules. Both steps are very timeconsuming due to the tremendous number of translation rules, the exhaustive search in translation rule matching and the complex nature of the translation task it...

متن کامل

Edinburgh's Syntax-Based Machine Translation Systems

We present the syntax-based string-totree statistical machine translation systems built for the WMT 2013 shared translation task. Systems were developed for four language pairs. We report on adapting parameters, targeted reduction of the tuning set, and post-evaluation experiments on rule binarization and preventing dropping of verbs.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009